UDBRNet: A novel uncertainty driven boundary refined network for organ at risk segmentation

Organ segmentation has become a preliminary task for computer-aided intervention, diagnosis, radiation therapy, and critical robotic surgery. Automatic organ segmentation from medical images is a challenging task due to the inconsistent shape and size of different organs. Besides this, low contrast at the edges of organs due to similar types of tissue hampers the network's ability to segment the contour of organs properly. In this paper, we propose a novel convolutional neural network based uncertainty-driven boundary-refined segmentation network (UDBRNet) that segments the organs from CT images. The CT images are first segmented by a multi-line segmentation decoder, which produces multiple segmentation masks. Uncertain regions are identified from the multiple masks, and the boundaries of the organs are refined based on the uncertainty data. Our method achieves dice accuracies of 0.80, 0.95, 0.92, and 0.94 for Esophagus, Heart, Trachea, and Aorta respectively on the SegThor dataset, and 0.71, 0.89, 0.85, 0.97, and 0.97 for Esophagus, Spinal Cord, Heart, Left-Lung, and Right-Lung respectively on the LCTSC dataset. These results demonstrate the superiority of our uncertainty-driven boundary refinement technique over state-of-the-art segmentation networks such as UNet, Attention UNet, FC-denseNet, BASNet, UNet++, R2UNet, TransUNet, and DS-TransUNet. UDBRNet presents a promising network for more precise organ segmentation, particularly in challenging, uncertain conditions. The source code of our proposed method will be available at https://github.com/riadhassan/UDBRNet.


Introduction
Robotic surgery, computer aided diagnosis, and targeted radiation therapy require meticulous segmentation of the affected organ from adjacent organs [1][2][3][4][5]. The authors of [6] examined the evolution of automatic multi-organ segmentation techniques, comparing traditional methods with deep learning approaches, and found that deep learning methods consistently outperformed traditional approaches, indicating their superior efficiency in segmentation tasks.
However, despite their success, deep learning models encounter challenges in complex environments [7].
Abdominal organs are difficult to segment due to overlap, inconsistent shape, and uneven size [7][8][9][10]. Many convolutional neural network (CNN) based architectures have been proposed to address the challenges posed by diverse organ shapes, sizes, and contrast variations. Among these, DenseNet stands out for its densely interconnected layers, offering a registration-free approach for segmentation tasks [11]. Building upon DenseNet, the authors of [12] further refined the concept with a fully convolutional DenseNet specifically tailored for 2D medical image segmentation.
Ronneberger et al. proposed the U-Net architecture, which has emerged as a popular baseline in medical image segmentation [13]. U-Net's encoder-decoder design has become a standard framework, inspiring numerous extensions and adaptations to tackle various segmentation challenges. Notable among these extensions are V-Net and 3D U-JAPA-Net, which extend U-Net for volumetric medical image segmentation [14,15]. Additionally, Yagi et al. [16] developed a UNet based framework tailored for cancer radiotherapy support, with a focus on abdominal organ segmentation.
To enhance U-Net's performance, researchers have introduced various modifications. Wang et al. contributed Densely Connected Deep U-Net and incorporated densely connected layers to improve abdominal multi-organ segmentation [17,18]. Oktay et al. integrated attention mechanisms within the U-Net architecture, with attention gates at every step of the decoder in Attention UNet [19], and Nazib et al. incorporated uncertainty-based attention within the bottleneck of UNet [20]. Moreover, residual connections and recurrent layers are added to the U-Net architecture for feature accumulation in R2UNet [21].
Further advancements include Dense V-Net and improved U-Net architectures which utilize high connectivity between encoder and decoder [22,23]. Multiple nested U-Net pathways with skip connections have been proposed to capture hierarchical features and context more effectively in UNet++ [24]. Additionally, full-scale skip connections and deep supervision, along with a classification-guided module, have been integrated within U-Net for enhanced medical image segmentation in UNet3+ [25]. Moreover, the transformer based networks TransU-Net [26], DS-TransUNet [27], and EG-TransUNet [28] are proposed to integrate both CNN and transformer based features in medical segmentation. Additionally, boundary aware segmentation networks [29], cascaded spatial shift networks [30], and multiple attention-based segmentation networks [31] have been proposed to address specific challenges in feature refinement. While these networks excel in segmenting relatively consistent and large organs, they may encounter difficulties in smaller, unevenly shaped organs with low contrast around the edges, such as the esophagus and heart [32].
Uncertainty driven organ segmentation improves the performance of medical image segmentation. Recent research has shown that uncertainty levels in convolutional neural networks may reveal segmentation issues. The authors of [33] proposed a segmentation network where they used uncertainty information. To estimate uncertainty, they needed an independent generative adversarial network. The authors of [34] proposed a segmentation network where they needed to input the current CT slice, an adjacent CT slice, and a prediction mask from another segmentation network to estimate uncertainty. The authors of [35] proposed a segmentation network where multiple manually segmented ground truths were required for every slice of the CT image to determine uncertainty.
Conventional networks suffer from over-segmentation or under-segmentation around boundary regions due to the similar contrast of the tissue of adjacent organs and inconsistent organ shape and size. Uncertainty driven deep learning networks need either multiple ground truths or separate independent networks for uncertainty map identification.
To overcome the above-mentioned issues, we propose a deep learning based uncertainty driven boundary refined end-to-end network for precise organ segmentation from CT images, UDBRNet, where the organs are segmented, followed by the organs' boundary refinement with the help of uncertainty information. The CT images are passed through the encoder, and the main decoder produces the main mask, whereas two parallel auxiliary decoders, with a feature drop layer and a random noise layer respectively, generate two auxiliary masks. Disagreement regions among the output masks from the multiple decoder lines are considered uncertain regions. Uncertainty information is derived by combining the main segmentation mask with the uncertain region data. Both the main segmentation mask and the uncertainty information are forwarded to the boundary refinement module to refine the boundary residuals of organs. Here, we utilize a hybrid regularizer loss function combining dice and cross-entropy to consider both shape and entropy penalties during training. We can summarise our contributions in this paper as follows:
• We propose a multi-line decoder-based segmentation module to identify uncertainty regions from a single labeled dataset. This consists of one main decoder and two auxiliary decoders, one with noise addition and another with a feature drop operation.
• We propose a boundary refinement network that considers uncertainty information along with a segmentation mask to refine the edges of the organs.
• With the segmentation module and the boundary refinement network, we propose an end-to-end uncertainty-driven boundary-refined segmentation network, termed UDBRNet, to segment the organs from CT images. We then conduct extensive experiments on two publicly available datasets to compare UDBRNet with eight state-of-the-art segmentation networks.
The remainder of the paper is organized as follows: Section 2 focuses on the methodology; the experimental details are presented in Section 3; the experimental results of our proposed method, UDBRNet, and the eight existing state-of-the-art networks are compared for the two datasets in Section 4; furthermore, ablation studies for evaluating the effectiveness of different modules of the proposed method are reported in Section 4.3. Finally, the overall conclusion is presented in Section 5.

Methodology
In our proposed method, we segment organs in three steps. In the first step, organs are segmented from CT images utilizing an encoder-decoder based architecture with one encoder and a multi-line decoder comprising one main decoder and two auxiliary decoders. The two auxiliary decoders are incorporated to produce two auxiliary segmentation masks for identifying uncertain regions. Then the difference between the union and the intersection of all masks is considered the uncertain region. Finally, the segmentation masks' boundaries are refined by the boundary refinement module with the help of the uncertainty information. The overall architecture of the proposed method is illustrated in Fig 1.

Segmentation module
The design of the encoder and decoder of the segmentation module is inspired by the UNet architecture [13]. We implement a block, defined in Eq (1) as the F operation, which comprises a sequential 3 × 3 convolution and Batch Normalization followed by a ReLU activation function. In the encoder, after two consecutive F operations, a 2 × 2 MaxPooling operation is performed. In the first step, we expand our single channel CT data into 64 channels, and then the number of channels is doubled in every subsequent step of the encoder. The encoder is represented as x_e in Eq (2). The output from the encoder x_e is directly passed to the main decoder. Additionally, Uniformly Distributed Random Noise (UDRN) and feature drop are employed, respectively, within the encoder and decoder to create two auxiliary decoder lines, which are presented in Eq (3). The output from the encoder is fed to the corresponding decoder. In the decoders, upsampling and then two consecutive F operations are performed in every step. In this case, the number of channels in each step becomes half that of the previous step, the reverse of the encoder. In every decoder step, skip connections are added from the corresponding encoder step to retain spatial details, enhance gradient flow, and capture contextual information. Then, in the last step of the decoder, a 1 × 1 convolution is performed and produces an N channel output, where N is the number of segmentation classes. The decoder module is presented in Eq (4). A pictorial depiction of the segmentation module encoder and decoder is in Fig 2(a). After the SoftMax operation on the outputs from the one main and two auxiliary decoder lines as in Eqs (5), (6) and (7), one main mask Mask_main and two auxiliary segmentation masks Mask_aux1 and Mask_aux2 are produced, respectively, as the output of the segmentation module.
Here, the subscript i represents the layer number of the encoders and the decoder.
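The two perturbations that create the auxiliary decoder lines, and the SoftMax step that turns decoder outputs into masks, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the drop probability, the noise magnitude, and the tensor shapes are all assumptions chosen for illustration, since the text does not specify them, and NumPy stands in for the PyTorch layers used in the paper.

```python
import numpy as np

def feature_drop(features, drop_prob, rng):
    """Zero out randomly chosen feature elements (hypothetical drop rule)."""
    keep = rng.random(features.shape) >= drop_prob
    return features * keep

def add_uniform_noise(features, magnitude, rng):
    """Add noise drawn uniformly from [-magnitude, magnitude) (assumed range)."""
    noise = rng.uniform(-magnitude, magnitude, size=features.shape)
    return features + noise

def softmax(logits, axis=0):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Stand-in for an encoder output x_e with shape (channels, height, width).
x_e = rng.standard_normal((64, 8, 8))
x_drop = feature_drop(x_e, drop_prob=0.3, rng=rng)          # input to one auxiliary decoder
x_noise = add_uniform_noise(x_e, magnitude=0.3, rng=rng)    # input to the other auxiliary decoder

# Stand-in for decoder logits with N = 5 classes on a 16 x 16 slice.
logits = rng.standard_normal((5, 16, 16))
probs = softmax(logits, axis=0)   # per-pixel class probabilities
mask = probs.argmax(axis=0)       # hard label map derived from the probabilities
```

Because the main decoder sees the unperturbed features while the auxiliary decoders see perturbed copies, their outputs tend to differ most where the prediction is least stable, which is the signal the uncertainty determination module relies on.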

Uncertainty determination module
To identify the uncertainty, a region is first considered uncertain for a particular organ if any one of the three output masks disagrees with the other masks. To obtain the Disagreement region as in Eq (10), the difference between the union and the intersection of all three output masks is taken, where the union represents both agreement and disagreement and is symbolized as Mask_all in Eq (8), and the intersection represents only agreement and is symbolized as Mask_common in Eq (9). The process is depicted in Fig 3. Finally, to get Uncertainty, Mask_main is element-wise multiplied with the Disagreement region as in Eq (11).
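The union, intersection, and difference steps described for Eqs (8) to (11) can be sketched in NumPy as follows. The sketch assumes binary (one-hot) masks of shape (classes, height, width); the function name `uncertainty_map` and the toy arrays are hypothetical, and the paper's PyTorch code may operate on SoftMax probabilities instead.

```python
import numpy as np

def uncertainty_map(mask_main, mask_aux1, mask_aux2):
    """Return (disagreement, uncertainty) for binary masks of identical shape."""
    stacked = np.stack([mask_main, mask_aux1, mask_aux2]).astype(bool)
    mask_all = stacked.any(axis=0)           # union of the three masks, Eq (8)
    mask_common = stacked.all(axis=0)        # intersection of the three masks, Eq (9)
    disagreement = mask_all & ~mask_common   # union minus intersection, Eq (10)
    uncertainty = mask_main * disagreement   # element-wise product with the main mask, Eq (11)
    return disagreement, uncertainty

# Toy example: one class, a single row of four pixels.
main = np.array([[[1, 1, 0, 0]]])
aux1 = np.array([[[1, 0, 0, 0]]])
aux2 = np.array([[[1, 1, 1, 0]]])

disagreement, uncertainty = uncertainty_map(main, aux1, aux2)
```

In the toy example, the second and third pixels are flagged as disagreement because the decoders do not all agree there, and only the second pixel survives the multiplication with the main mask.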

Boundary refinement module
In the boundary refinement module, the main segmentation mask Mask_main from Eq (5) and the uncertainty information Uncertainty from Eq (11) are fed in, and the module produces a residual, which is element-wise added to the main mask to refine edges for more accurate segmentation. The refinement module comprises two identical encoders and one decoder. The encoders are termed y_e and z_e in Eqs (12) and (13), respectively. The decoder is symbolized as y_d in Eq (14). The main segmentation mask is sent to y_e and the uncertainty region information is passed through z_e. In each encoder, a 3 × 3 convolution layer is employed first and produces 64 channel data. After this, in every step of the encoders, sequential F and 2 × 2 MaxPooling operations are performed. Of the two encoders, the output from the encoder that encodes the main segmentation mask is passed to the decoder, where skip connections from both encoders are concatenated in the corresponding layers. In every step of the decoder, F and bi-linear upsampling with a scaling factor of 2 are performed. Finally, according to Eq (15), a convolution is employed to get the residual, in which the number of channels is equal to the number of segmentation classes. The residual Mask_residual is then element-wise added to the main segmentation mask Mask_main for more accurate edge segmentation as in Eq (16). The architecture of the boundary refinement module is shown in Fig 2(b).
Here, the subscript i represents the layer number of the encoders and the decoder.
We implement a loss function by combining dice loss and cross-entropy loss for regularization, inspired by [36]. Mask_main, Mask_aux1, Mask_aux2, and Mask_refined are supervised by adding all losses before backpropagation during the training phase, as in Eq (17). The loss function L(·) is described in Section 3.4.
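The bodies of Eqs (16) and (17) are not reproduced in this text. Based only on the description above, they can be sketched as follows, assuming that the four losses are summed with equal weight (the text says only that all losses are added) and writing GT for the ground truth mask, a symbol introduced here for illustration:

```latex
\begin{align}
\text{Mask}_{\text{refined}} &= \text{Mask}_{\text{main}} + \text{Mask}_{\text{residual}} \\
\mathcal{L}_{\text{total}} &= \mathcal{L}(\text{Mask}_{\text{main}}, GT)
  + \mathcal{L}(\text{Mask}_{\text{aux1}}, GT)
  + \mathcal{L}(\text{Mask}_{\text{aux2}}, GT)
  + \mathcal{L}(\text{Mask}_{\text{refined}}, GT)
\end{align}
```

Supervising the auxiliary masks alongside the main and refined masks keeps all three decoder lines accurate, so their disagreement concentrates on genuinely ambiguous regions rather than on poorly trained outputs.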

Datasets
For evaluating our proposed method, we use two publicly available datasets, SegThor and LCTSC.

SegThor.
CT scans of 40 patients with manual labeling of four organs at risk (i.e., Esophagus, Heart, Trachea, Aorta) are publicly available. The data of 32 patients were used for training, and the data of 8 patients were utilized for testing. In total, it contains 7390 slices of 512 × 512 images [37].

LCTSC.
It is a CT scan and label dataset of 60 patients that contains annotations of five organs (i.e., Esophagus, Spinal cord, Heart, Left Lung, Right Lung). The data of 36 patients are used for training, and the remaining data are split into 12 patients for testing and 12 for validation. In total, it contains 9,593 slices of 512 × 512 images [38].

Preprocessing
We apply an identical pre-processing pipeline for both datasets. A certain level and window size are used to improve the contrast of medical images. In this instance, a window size of 400 and a level of 30 are applied to every patient's CT scan to make the structures in the images more visible. Following the contrast enhancement, the region of interest for organ segmentation, which typically represents the human body, is extracted from the overall CT scan image. This phase removes irrelevant information, like the couch of the CT scanner, from the CT images. Once the human body part has been cropped, the three-dimensional (3D) voxel data are transformed into a series of two-dimensional (2D) images by extracting each slice along the axial axis of the CT scan. The image slices are resized from 512 × 512 to 256 × 256 so that they fit in the computation memory. Besides this, the data are augmented by rotating, cropping, and padding.
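The level/window contrast step can be sketched as below. The sketch assumes that the input intensities are in Hounsfield units and that the windowed values are rescaled to [0, 1]; the function name `apply_window` and the output range are illustrative choices, not details given in the text.

```python
import numpy as np

def apply_window(hu, level=30.0, width=400.0):
    """Clip intensities to the window [level - width/2, level + width/2] and rescale to [0, 1]."""
    low = level - width / 2.0    # -170 for level 30 and window size 400
    high = level + width / 2.0   # 230 for level 30 and window size 400
    clipped = np.clip(hu, low, high)
    return (clipped - low) / (high - low)

# Sample intensities: below the window, at both window edges, at the level, and above the window.
sample = np.array([-1000.0, -170.0, 30.0, 230.0, 1000.0])
windowed = apply_window(sample)
```

With a level of 30 and a window size of 400, everything below -170 maps to 0 and everything above 230 maps to 1, so the available intensity range is spent on soft tissue rather than on air or dense bone.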

UNet.
It is an encoder-decoder based convolutional neural network architecture widely used in biomedical image segmentation tasks. It uses skip connections to concatenate feature maps from different levels for improved information flow [13] https://github.com/milesial/Pytorch-UNet.

FC-DenseNet.
In this method, densely connected blocks extract and reuse features, where each dense block links numerous layers, boosting information flow. Transition layers set feature map size and numbers as well as skip connections [12] https://github.com/SimJeg/FC-DenseNet.

UNet++.
Cascade UNet or UNet++ enhances the skip connections in the U-Net model by incorporating nested and dense skip pathways. By enhancing the skip connections, it extracts more meaningful features from its input data, which leads to better segmentation performance [24] https://github.com/MrGiovanni/UNetPlusPlus.

BASNet.
It is a segmentation architecture that uses convolution, batch normalization, max pooling, ReLU activation, and bilinear upsampling sequentially in encoding and decoding. The backbone network captures multi-level characteristics from the input, while the boundary is refined to improve boundary segmentation [29] https://github.com/xuebinqin/BASNet.

R2UNet.
Residual Recurrent U-Net is a medical image segmentation architecture that combines U-Net structure with residual connections and recurrent layers, improving contextual information integration for enhanced segmentation accuracy [21] https://github.com/navamikairanda/R2U-Net.

TransUNet.
TransUNet is a hybrid model that combines transformer and CNN architectures, relying on self-attention processes to efficiently gather global image information. This technique improves medical image segmentation tasks by combining the features of both architectures [26] https://github.com/mkara44/transunet_pytorch.

DS-TransUNet.
DS-TransUNet integrates dense supervision and self-attention techniques in a single architecture for medical image segmentation problems. The model has robust connections between the encoder and decoder layers to enhance the flow of gradients and the transfer of information [27] https://github.com/TianBaoGe/DS-TransUNet.

Loss function
For regularization, we utilize a hybrid loss function comprising cross entropy loss and dice loss. Both are very popular loss functions in the segmentation field, and a linear addition of these two losses performs better during segmentation [36]. Cross entropy loss adds a penalty for the pixel-wise prediction, which is represented in Eq (18), whereas dice loss adds a penalty for the degree of mismatch between the predicted region and the ground truth region for a particular class, which is presented in Eq (19). After adding the two losses as in Eq (20), backpropagation is performed to optimize the loss.
Here, N is the number of samples, C is the number of classes, I_{i,c}(A, B) is a binary indicator (0 or 1) of whether class c is the correct class for the i-th sample between ground truth A and predicted mask B, and B_{i,c} is the predicted probability that the i-th sample belongs to class c.
The Dice(A, B) is defined in Eq (21).
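The hybrid loss of Eqs (18) to (20) can be illustrated with a small NumPy sketch. It assumes a soft dice formulation averaged over classes and an unweighted sum of the two terms; the function names, the epsilon smoothing, and the class-averaging rule are assumptions, and the authors' PyTorch implementation may differ in these details.

```python
import numpy as np

def cross_entropy_loss(target_onehot, probs, eps=1e-7):
    """Mean pixel-wise cross entropy; both inputs have shape (N, C)."""
    log_probs = np.log(np.clip(probs, eps, 1.0))
    return float(-np.mean(np.sum(target_onehot * log_probs, axis=1)))

def dice_loss(target_onehot, probs, eps=1e-7):
    """One minus the soft dice score, averaged over classes (assumed averaging rule)."""
    intersection = np.sum(target_onehot * probs, axis=0)
    denominator = np.sum(target_onehot, axis=0) + np.sum(probs, axis=0)
    dice_per_class = (2.0 * intersection + eps) / (denominator + eps)
    return float(1.0 - np.mean(dice_per_class))

def hybrid_loss(target_onehot, probs):
    """Unweighted sum of cross entropy and dice loss."""
    return cross_entropy_loss(target_onehot, probs) + dice_loss(target_onehot, probs)

# Four samples (pixels), three classes.
target = np.eye(3)[[0, 1, 2, 1]]        # one-hot ground truth
perfect = target.copy()                 # prediction identical to the ground truth
uniform = np.full_like(target, 1 / 3)   # maximally unsure prediction

loss_perfect = hybrid_loss(target, perfect)
loss_uniform = hybrid_loss(target, uniform)
```

The cross entropy term penalizes each pixel independently, while the dice term penalizes region-level mismatch per class, which is why the combination is described as considering both shape and entropy penalties.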

Evaluation metrics
To evaluate our proposed method, we use the dice coefficient and Hausdorff Distance (HD) in testing for every comparing method. All the evaluations are performed on 3D data, which is generated by stacking 2D prediction masks. Eq (21) represents the dice coefficient, which is used to evaluate the degree of overlap between two groups. The dice coefficient ranges from 0 to 1, where a higher value indicates a greater overlap or similarity between the predicted and ground truth masks being compared. In Eq (21), A and B represent the ground truth and prediction mask, respectively. The HD metric is highly informative, as it serves as an indicator of the degree of dissimilarity of a segmentation. It expresses the dissimilarity between the boundaries of the estimated and ground truth surfaces. A lower HD value signifies a higher degree of similarity, indicating better agreement between the predicted and ground truth segmentation masks.
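Both metrics can be sketched in NumPy as follows. The dice coefficient uses the standard overlap definition 2|A ∩ B| / (|A| + |B|). The Hausdorff distance here is a brute-force version computed over all foreground voxels in voxel units, which is a simplification: the paper measures dissimilarity between surface boundaries, and the voxel spacing it used is not stated. The function names are hypothetical.

```python
import numpy as np

def dice_coefficient(a, b):
    """Dice overlap between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    a = a.astype(bool)
    b = b.astype(bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return float(2.0 * np.logical_and(a, b).sum() / total)

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between foreground voxel sets (brute force)."""
    pts_a = np.argwhere(a)
    pts_b = np.argwhere(b)
    if len(pts_a) == 0 or len(pts_b) == 0:
        raise ValueError("both masks need at least one foreground voxel")
    # Pairwise Euclidean distances between every point of A and every point of B.
    dists = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    return float(max(dists.min(axis=1).max(), dists.min(axis=0).max()))

# Toy 2D example: the prediction extends the ground truth by one extra column.
truth = np.zeros((4, 4), dtype=int)
truth[1:3, 1:3] = 1
pred = np.zeros((4, 4), dtype=int)
pred[1:3, 1:4] = 1

dice = dice_coefficient(truth, pred)
hd = hausdorff_distance(truth, pred)
```

The brute-force distance matrix grows with the product of the two voxel counts, so for full 3D volumes a surface extraction step or a spatial index would normally be used first.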

Experiment design
Nine independent experiments are conducted based on eight architectures of comparing methods and our proposed method UDBRNet on two datasets (SegThor and LCTSC). For our proposed method, we use the ADAM optimizer with 200 training epochs, a learning rate of 0.001, and a batch size of 1. For regularization, we utilize a hybrid regularizer loss function by adding both dice loss and cross-entropy loss, which is discussed in Section 3.4.

Implementation
The experiment is implemented using PyTorch 2.1.2. All the models' training and testing are performed on a high-performance computing system with an Intel Xeon 2.40 GHz processor, 64 GB RAM, and an Nvidia V100 GPU.

Result and discussion
We conducted a comparative analysis of our proposed method, UDBRNet, against eight state-of-the-art segmentation methods. UDBRNet demonstrated superior performance compared to the other methods. The qualitative results are illustrated in Figs 4 and 5. Besides this, the quantitative results are reported in Tables 1 and 2. In every table, the first row represents the segmentation architecture names, and the rest of the rows represent the organ names, corresponding dice score, and HD value with variance. A model with a higher dice score or a lower HD score is considered to have better performance than other models, which is discussed in Section 3.5. The best performing result for every organ is highlighted in bold text.

Discussion on the results for SegThor dataset
The experimental results of the eight comparing methods and our proposed method on the SegThor dataset are presented in Table 1. Our proposed method outperforms existing approaches on the SegThor dataset and achieves dice scores of 0.80, 0.95, 0.92, 0.94 and HD values of 0.81, 0.64, 0.33, 0.39 for esophagus, heart, trachea, and aorta, respectively, which demonstrates significant enhancements in segmentation accuracy for different organs. Our approach consistently surpasses the baseline models. Moreover, the HD values acquired using our proposed method are typically lower than those of rival models, indicating superior boundary delineation. The results indicate that our method, which incorporates uncertainty estimation and boundary refinement, significantly improves the accuracy of segmentation and the precision of boundaries.
Figs 4 and 6 present the qualitative outcomes of the 3D and 2D organ illustrations for the SegThor dataset, respectively. The contouring of the ground truth and predicted results clearly demonstrates that R2UNet and FC-DenseNet under-segment, and the others over-segment, in Esophagus segmentation. In Heart segmentation, most of the methods under-segment, whereas our proposed method segments the Heart more consistently. Trachea is under-segmented by FC-DenseNet and over-segmented by all other comparing methods. In the case of Aorta segmentation, R2UNet, BASNet, FC-DenseNet, and TransUNet under-segment, while UNet and its successor UNet++ over-segment. Our proposed method outperforms all other methods being compared in terms of segmentation, as we consider uncertainty during boundary refinement, which leads UDBRNet to segment organ boundaries properly and ensures higher accuracy. Segmentation is particularly difficult for organs that contain low-contrast tissue around the edges, like the Esophagus and Heart. In this unfavorable situation, our proposed method consistently segments organs with a remarkably higher accuracy margin as we consider uncertainty during boundary refinement.

Discussion on the results for LCTSC dataset
Our suggested segmentation method has been thoroughly evaluated against several state-of-the-art techniques on both the SegThor and LCTSC datasets. While some existing methods like UNet++ and DS-TransUNet show competitive performance in terms of dice scores, they often exhibit higher HD values, indicating poorer boundary localization. In contrast, by utilizing uncertainty data in boundary refinement, UDBRNet demonstrates a superior ability to reliably delineate organs and consistently outperforms the benchmarked methods across several organs, such as the esophagus, heart, trachea, aorta, spinal cord, left lung, and right lung. Besides this, UDBRNet exhibits less variance, which indicates the stability of the network, a necessity for medical applications. The results establish our method as a promising option for organ segmentation from CT images, highlighting its potential to advance the field of medical image analysis and contribute to improved clinical diagnoses and treatment planning. Additional qualitative visualization can be found in S1 Appendix.

Ablation studies
The ablation studies conducted on both the SegThor and LCTSC datasets provide insightful observations regarding the impact of various components within the proposed UDBRNet architecture, and are reported in Tables 3 and 4 for the SegThor and LCTSC datasets, respectively. For segmentation without boundary refinement, we employ only the encoder and main decoder from the segmentation module to produce the segmentation mask. For boundary refined segmentation without uncertainty data, we feed only the main segmentation mask from the segmentation module to the boundary refinement module, so the uncertainty determination module and the uncertainty information encoder of the boundary refinement module are not necessary. To check the effectiveness of the auxiliary decoders of the segmentation network, every combination of auxiliary decoder 1 and auxiliary decoder 2 is employed during uncertainty calculation. Furthermore, we apply Gaussian Noise (GN) and UDRN separately in our noise addition layer to show the effectiveness of each noise type with our proposed network.
The baseline segmentation module exhibited moderate performance, suggesting its ability to provide initial organ segmentation. However, the incorporation of the uncertainty determination and boundary refinement modules resulted in substantial enhancements in segmentation accuracy for all organs. This improvement emphasizes the vital importance of uncertainty information in directing the refinement of the edges of organs. The network exhibits similar performance when using a single auxiliary decoder for uncertainty determination, whether it is the decoder with dropped features or the decoder with added noise. The addition of both auxiliary decoders resulted in further enhancements in segmentation results, which emphasizes the capacity of feature dropout and noise injection within the network to identify uncertain regions more rigorously and improve the resilience of the segmentation process. Moreover, the exploration of several noise types uncovered their effectiveness in organ segmentation with UDBRNet, highlighting the need to choose UDRN. The best design, which integrates the segmentation, uncertainty determination, and boundary refinement modules together with both auxiliary decoders and UDRN as the noise type, consistently achieved the maximum segmentation accuracy for both datasets.
The proposed segmentation method has the potential to be applied to other application areas where the degree of uncertainty is higher, for instance, anomaly detection in security and surveillance, inspection in robotics, and object segmentation in adverse weather conditions for self-driving cars.

Conclusion
In this work, we proposed an end-to-end uncertainty driven boundary refined segmentation architecture for medical image segmentation, which consists of segmentation, uncertainty determination, and boundary refinement modules. The segmentation module produces three output masks from the main and two auxiliary decoder lines. Based on disagreement among the three masks, uncertain regions are identified. Utilizing both the main segmentation mask and the uncertainty information, the boundary refinement module produces the refined segmentation mask. Our proposed method is tested on two publicly available datasets and compared with eight state-of-the-art segmentation architectures. Our method outperforms all others, specifically on organs whose size and shape are inconsistent and that have low contrast with the tissue of adjacent organs. Thus, this network has the potential to segment more reliably in uncertain environments. In the future, research may be done to minimize the complexity of the underlying architecture and segment organs more precisely.

Fig 1. The overall proposed UDBRNet architecture, where the segmentation module takes a CT image in the encoder and generates three segmentation masks from one main and two auxiliary decoders. The encoder's output is directly fed into the main decoder, while the feature drop operation for one auxiliary decoder and the random noise addition operation for the other auxiliary decoder are carried out before the output is supplied. The uncertainty determination module determines the uncertainty map based on disagreement among the predicted masks from the multiple decoders. Finally, the boundary refinement module refines each organ's boundary, considering the uncertainty map and the main segmentation mask. The detailed internal network architectures of the segmentation and boundary refinement modules are in Fig 2, and the uncertainty determination module is in Fig 3.

Fig 2. Layer architecture of the proposed method. a) Architecture of Encoder and Decoder of Segmentation Module. One encoder and three decoders (one main and two auxiliaries) are used in UDBRNet's segmentation module to produce three masks, which are used for uncertainty determination. b) Architecture of Boundary Refinement Module, which takes the main segmentation mask and uncertainty information and produces the mask residual. https://doi.org/10.1371/journal.pone.0304771.g002

Fig 6. 2D contoured segmentation images from SegThor dataset. The red contours depict the ground truth, while the green contours depict the segmentation achieved by the corresponding architecture. The value in the upper-left corner of each slice represents the corresponding dice accuracy. https://doi.org/10.1371/journal.pone.0304771.g006

Fig 3. Disagreement region determination from multiple segmentation masks, which are produced from one main and two auxiliary decoders of the segmentation module.
https://doi.org/10.1371/journal.pone.0304771.g003

Table 2 displays the experimental results of all approaches, including our proposed method UDBRNet, on the LCTSC dataset. UDBRNet achieves dice scores of 0.71, 0.89, 0.85, 0.97, 0.97 and HD scores of 1.56, 0.67, 1.39, 0.60, 0.60 for esophagus, spinal cord, heart, left lung, and right lung, respectively, which shows substantial improvements in segmentation accuracy compared to existing methods. This indicates that our segmentation quality and border delineation are superior. Significantly, our approach outperforms baseline models such as UNet, Attention UNet, R2UNet, UNet++, FC-DenseNet, BASNet, TransUNet, and DS-TransUNet by a substantial degree, especially in organs with complex architecture such as the heart and esophagus, which highlights the effectiveness of including uncertainty-driven boundary refinement. The qualitative results of the experiments for LCTSC are presented in Figs 5 and 7, which visually reveal that UDBRNet performs better than all other methods in organ segmentation. Though all the segmentation methods show close performance in Left Lung and Right Lung segmentation, as these organs contain high contrast tissue around the edges, the comparing methods fail to segment properly when the organs' shapes are uneven and contain low contrast tissue around the edges.